Search AI Products and News

Explore worldwide AI information, discover new AI opportunities

2025-07-18 09:57:18.AIbase

5.63% Error Rate Sets New Low: NVIDIA AI Launches Commercial-Grade Ultra-High-Speed Speech Recognition Model Canary-Qwen-2.5B

NVIDIA's Canary-Qwen-2.5B sets a 5.63% WER record on the Hugging Face OpenASR leaderboard. This CC-BY licensed model combines a FastConformer encoder with a Qwen3-1.7B LLM decoder for efficient speech-to-text and NLP. It supports multi-GPU deployment for cloud and edge applications.

2025-07-17 14:18:08.AIbase

New Xiaomi Car Feature: Quick Find Car with Automatic Photography and AI Recognition

Xiaomi Cars launches a 'Quick Find' feature for YU7 models, using AI-powered image recognition to help locate parked cars via photos and voice queries.

2025-07-07 17:36:29.AIbase

Stream-Omni: Supporting Interaction Across Combinations of Modalities, Opening an Era of Integrated Text, Vision, and Speech

CASIC introduces Stream-Omni, a multimodal model supporting text, vision, and speech. It uses targeted alignment to reduce data dependency, excels in cross-modal tasks, and offers open-source resources.

2025-07-04 11:13:59.AIbase

Open Source Revolution! Kyutai TTS Launches: Ultra-Low Latency Speech Synthesis, the New Era of AI Voice is Here!

Recently, the French AI laboratory Kyutai announced the open-source release of its new text-to-speech model, Kyutai TTS, providing developers and researchers worldwide with a high-performance, low-latency speech synthesis solution. The release not only advances open-source AI technology but also opens up new possibilities for multilingual voice interaction applications. AIbase provides an exclusive analysis of this technological highlight and its potential impact. Ultra-low latency for a new real-time interaction experience: Kyutai TTS has become an industry standout with its exceptional performance.

2025-07-02 16:19:47.AIbase

Open Source End-to-End Speech Large Model Step-Audio-AQAA: Understand Audio and Generate Natural Speech Directly

2025-07-01 14:07:56.AIbase

TEN VAD Shocks Open Source: Enterprise-Level Speech Detection Tool, Creating a Super Intelligent AI Voice Assistant!

2025-07-01 11:25:55.AIbase

TEN Agent Open-Sources TEN VAD and Turn Detection, Enabling Ultra-Low Latency for Speech AI

The TEN Agent team recently announced that its core models **TEN Voice Activity Detection (VAD)** and **TEN Turn Detection** are now open source, providing powerful technical support for building real-time, multimodal speech AI agents. This move marks a significant advancement in the TEN framework's efforts to promote the democratization and open-source collaboration of speech interaction technology. The following is the latest information compiled by AIbase, offering an in-depth analysis of these two core models.

2025-07-01 11:01:49.AIbase

Qwen-TTS Launches with Major Breakthrough in Dialect Speech Synthesis, Realism Comparable to Human Voices

2025-07-01 08:42:27.AIbase

New Release of Qwen-TTS Adds Support for Three Chinese Dialects

Recently, the speech synthesis model Qwen-TTS received an update, with its latest version released through the Qwen API, bringing users a richer speech synthesis experience. In this update, Qwen-TTS added support for three Chinese dialects: Beijing dialect, Shanghai dialect, and Sichuan dialect, further expanding its application scenarios. The model is trained on a large-scale corpus of more than 3 million hours, achieving naturalness and expressiveness at a human level. Qwen-TTS can not only accurately

2025-06-25 08:48:03.AIbase

ElevenLabs Launches Mobile App; Free Users Get 10 Minutes of Text-to-Speech Credit

2025-06-18 15:55:26.AIbase

Apple's New Speech Technology Takes the Field! 34-Minute 4K Video Transcription Completed in Only 45 Seconds, Speed Exceeds OpenAI by 55%

2025-06-18 11:09:50.AIbase

Apple's new Speech API transcribes at an impressive speed, surpassing OpenAI Whisper by 55%

2025-06-17 11:17:00.AIbase

Comprehensive Review of UntitledPen: Full Analysis of an AI Voice Generation Tool - How to Create Natural Voice Content

This article provides an in-depth review of the UntitledPen AI voice generation tool, analyzing its core features such as its intelligent writing assistant, lifelike voice conversion technology, and multilingual support. It helps content creators, video producers, and marketing experts evaluate the practical value and user experience of this tool.

2025-06-11 16:18:09.AIbase

ByteDance Volcano Engine Releases DouBao · Voice Podcast Model and DouBao · Real-time Speech Model

2025-06-11 09:08:44.AIbase

Millisecond Recognition of Fatal Conditions: Global First Clinical AI Radiology System Boosts Efficiency by 80%

According to Science and Technology Daily, the world's first generative artificial intelligence radiology system integrated into clinical workflows, developed by Northwestern University School of Medicine, is revolutionizing medical imaging diagnostics. The system can identify life-threatening conditions within milliseconds, offering an innovative response to the global shortage of radiologists. It has been fully deployed in 12 hospitals affiliated with Northwestern University. Over five months of real-world use in 2024, the system analyzed nearly 24,000 radiology reports, increasing report generation efficiency by an average of 15.5%, boosting work efficiency for some doctors.

2025-06-10 08:46:30.AIbase

Apple WWDC 2025: iOS 26 Upgrades Visual Intelligence, with AI-Assisted On-Screen Content Recognition

2025-06-06 14:02:57.AIbase

OpenAudio Releases Open Source TTS Model S1-Mini: Super Natural AI Voice Created with 0.5B Parameters

Significant progress has been made in the field of AI voice technology as Fish Audio announces the open sourcing of its new Text-to-Speech (TTS) model, OpenAudio S1-Mini. As a streamlined version of the highly acclaimed S1 model, S1-Mini has triggered industry discussions due to its lightweight design, high expressiveness, and multi-language support. Key Features: Lightweight and High Performance. OpenAudio S1-Mini is a lightweight version distilled from the 4B-parameter S1 model, containing only 0.5B parameters.

2025-06-06 11:39:12.AIbase

ElevenLabs Launches V3 Voice Model: Supports Over 70 Languages and Allows Emotional and Tonal Control via Tags

ElevenLabs, a leading global AI voice technology company, has officially released its latest text-to-speech model, Eleven v3 (Alpha version). It is considered the most expressive AI voice model to date. This release not only enhances the naturalness and emotional expression of speech synthesis but also gives content creators and developers more powerful tools for video, audiobook, and multimedia tool development. Technical Breakthrough: More Natural Dialogue and Emotional Expression. Eleven v3 introduces a new

2025-06-06 09:05:34.AIbase

The Strongest AI Voice Is Here! Eleven v3 Alpha Version Makes a Stunning Debut: Can Speak and Act

With the rapid development of artificial intelligence technology, the text-to-speech (TTS) field has reached a new milestone. On June 5, 2025, ElevenLabs officially launched its latest text-to-speech model, Eleven v3 (Alpha version), known as the 'Strongest' TTS model. This model can not only convert text into natural and fluent speech but also simulate tone changes and non-verbal expressions in real conversations through precise emotional control and multi-language support, providing creators and developers with unprecedented voice generation capabilities.

2025-06-03 08:48:11.AIbase

Google Gemini Live Function Officially Lands on iOS Platform, Opening a New AI Recognition Experience